Method based on EM algorithm for estimating word translation probabilities in Thai – English machine translation
Authors
Abstract
Selecting, from a set of target-language words, the translation that conveys the correct sense of the source word and yields more fluent target-language output is one of the core problems in machine translation. In this paper, we compare three methods of estimating word translation probabilities for selecting the word translation in Thai – English machine translation: (1) a method based on the frequency of word translations, (2) a method based on the collocation of word translations, and (3) a method based on the Expectation Maximization (EM) algorithm. For evaluation, we used Thai – English parallel sentences generated by NECTEC. The EM-based method performs best of the three and gives satisfactory results.
Keywords: Machine translation, EM algorithm
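The abstract does not spell out the estimators, so the following is only a minimal sketch of the contrast between methods (1) and (3). It assumes a hypothetical, already word-segmented toy corpus, uses a sentence-level co-occurrence count as a stand-in for the frequency-based method, and uses IBM Model 1 style EM (without a NULL word) as a stand-in for the EM-based method. The names `CORPUS`, `cooccurrence_probs`, `em_probs` and `select` are illustrative only; the paper's actual model may differ (for example, in how candidate translations are restricted). The collocation-based method is not sketched.

```python
from collections import defaultdict

# Hypothetical toy corpus of Thai-English sentence pairs. Thai is written
# without spaces, so we assume word segmentation has already been done.
CORPUS = [
    (["แมว", "กิน", "ปลา"], ["cat", "eats", "fish"]),
    (["หมา", "กิน", "ข้าว"], ["dog", "eats", "rice"]),
    (["แมว", "นอน"], ["cat", "sleeps"]),
    (["หมา", "นอน"], ["dog", "sleeps"]),
]


def cooccurrence_probs(corpus):
    """Frequency-style baseline (a stand-in, not the paper's method):
    P(e | th) is proportional to the number of sentence pairs in which
    English word e and Thai word th co-occur."""
    counts = defaultdict(lambda: defaultdict(float))
    for th_sent, en_sent in corpus:
        for th in set(th_sent):
            for e in set(en_sent):
                counts[th][e] += 1.0
    probs = {}
    for th, row in counts.items():
        total = sum(row.values())
        probs[th] = {e: c / total for e, c in row.items()}
    return probs


def em_probs(corpus, iterations=20):
    """IBM Model 1 style EM estimate of t(e | th), with no NULL word."""
    en_vocab = {e for _, en_sent in corpus for e in en_sent}
    # Start from a uniform distribution over the English vocabulary.
    t = defaultdict(lambda: 1.0 / len(en_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (e, th) links
        total = defaultdict(float)  # expected count of links from th
        # E-step: share each English token among the Thai tokens of its
        # sentence in proportion to the current t(e | th).
        for th_sent, en_sent in corpus:
            for e in en_sent:
                norm = sum(t[(e, th)] for th in th_sent)
                for th in th_sent:
                    frac = t[(e, th)] / norm
                    count[(e, th)] += frac
                    total[th] += frac
        # M-step: renormalise the expected counts for each Thai word.
        t = {(e, th): c / total[th] for (e, th), c in count.items()}
    # Regroup as {thai_word: {english_word: probability}}.
    table = defaultdict(dict)
    for (e, th), p in t.items():
        table[th][e] = p
    return dict(table)


def select(table, th):
    """Pick the most probable English translation of Thai word th."""
    candidates = table[th]
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    cooc = cooccurrence_probs(CORPUS)
    em = em_probs(CORPUS)
    for th in ["แมว", "กิน", "ปลา"]:
        print(th, "co-occurrence:", cooc[th], "EM choice:", select(em, th))
```

On this toy corpus, ปลา ("fish") occurs in only one sentence, so the co-occurrence baseline gives cat, eats and fish the same probability (1/3) and cannot choose between them. EM breaks the tie because แมว and กิน, which also occur in other sentences, come to account for "cat" and "eats". This illustrates one plausible reason an EM-based estimate can outperform a plain frequency count.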
Similar resources
Estimating Word Translation Probabilities for Thai – English Machine Translation using EM Algorithm
Selecting the word translation from a set of target-language words, one that conveys the correct sense of the source word and makes the target-language output more fluent, is one of the core problems in machine translation. In this paper, we compare three methods of estimating word translation probabilities for selecting the translation word in Thai – English Machine Translation. The 3 methods are (1) Metho...
Estimating Word Translation Probabilities from Unrelated Monolingual Corpora Using the EM Algorithm
Selecting the right word translation among several options in the lexicon is a core problem for machine translation. We present a novel approach to this problem that can be trained using only unrelated monolingual corpora and a lexicon. By estimating word translation probabilities using the EM algorithm, we extend upon target language modeling. We construct a word translation model for German an...
A Hybrid Machine Translation System Based on a Monotone Decoder
In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...
Using Comparable Corpora to Adapt a Translation Model to Domains
Statistical machine translation (SMT) requires a large parallel corpus, which is available only for restricted language pairs and domains. To expand the language pairs and domains to which SMT is applicable, we created a method for estimating translation pseudo-probabilities from bilingual comparable corpora. The essence of our method is to calculate pairwise correlations between the words asso...
Improvement of Statistical Machine Translation using Character-Based Segmentation with Monolingual and Bilingual Information
We present a novel segmentation approach for Phrase-Based Statistical Machine Translation (PB-SMT) to languages where word boundaries are not obviously marked, using both monolingual and bilingual information, and demonstrate that (1) an unsegmented corpus can provide results nearly identical to those of a manually segmented corpus in the PB-SMT task when a good heuristic character clustering...
Publication date: 2007